SYNTHESIS OF SELF-TIMED CIRCUITS BY PROGRAM TRANSFORMATION*


Steven M. Burns and Alain J. Martin

Computer  Science  Department 
California  Institute  of  Technology 
Pasadena,  CA  91125  USA 

Self-timed circuits can be synthesized from concurrent programs in two logically
separate phases. First, through a series of program transformations, the source
program is decomposed into an equivalent program constructed entirely from
instances of basic processes. These basic processes correspond to the syntactic
constructs of the source language. The remainder of the synthesis procedure consists
of compiling each of the basic processes into a self-timed circuit using techniques
described in earlier papers. These compilations need to be done only once. This
paper describes in detail the program transformations used in an automated
synthesis procedure developed at Caltech. The transformations used are applications
of process decomposition, a simple technique that is easy to verify. The circuits
synthesized by these program transformations are correct by construction; thus,
this technique provides a simple method for constructing provably correct circuits
from a high-level specification.

We  propose  a  method  for  developing  VLSI  circuits  from  an  abstract  specification.  The 
programmer  designs  a  concurrent  program  that  meets  this  specification,  and  then  an 
automatic  mechanism  transforms  the  program  into  a  circuit.  The  programmer’s  proof 
obligation  is  limited  to  verifying  that  the  concurrent  program  is  an  implementation  of 
the  specification.  The  program-to-circuit  transformation  is  verified  separately. 

Program transformations within the source language provide a powerful tool for
deriving implementations of programs. By working at an abstract level in a language
with a well-defined semantics, transformations that are complex if performed at the
circuit level are reduced to trivial syntactic manipulations. Such transformations are
both easy to perform and easy to verify. They form the core of an automatic compiler
for synthesizing self-timed circuits.

We  have  constructed  a  set  of  program  transformation  rules  that,  when  applied  to 
any  program  in  the  source  language,  transform  it  into  an  equivalent  program  of  a 
very  simple  form.  This  form  is  composed  only  of  simple  basic  processes  that  have 
already  been  compiled  into  circuits.  In  this  paper,  we  describe  in  detail  this  set  of 
program  transformations.  In  addition,  we  show  the  compiled  circuits  for  each  of  the 
basic  processes  and  the  resulting  syntax-directed  translation  rules.  We  also  introduce 
and  compare  various  schemes  for  guard  evaluation  and  then  apply  these  schemes  to  a 
simple  example. 

*to  appear  in  The  Fusion  of  Hardware  Design  and  Verification,  G.J.  Milne,  ed.,  North-Holland 
(1988) 
 1.  <process>    ::=  ( <process> { || <process> } ) { <channel> }
 2.                |   { <port> } { <var> } <sequence>
 3.  <channel>    ::=  channel ( <NAME> , <NAME> )
 4.  <port>       ::=  ( passive | active ) <NAME> ( <INT> , <INT> )
 5.  <var>        ::=  boolean <NAME> = ( true | false )
 6.  <sequence>   ::=  <statement> [ ; <sequence> ]
 7.  <statement>  ::=  skip
 8.                |   <NAME> ( up | down )
 9.                |   <NAME> ( <INT> ) : [ <responses> ]
10.                |   ( [ | *[ ) <gcs> ]
11.  <responses>  ::=  <response> { | <response> }
12.  <response>   ::=  <INT> --> <sequence>
13.  <gcs>        ::=  <gc> { | <gc> }
14.  <gc>         ::=  <expr> --> <sequence>
15.  <expr>       ::=  <conjunct> [ or <expr> ]
16.  <conjunct>   ::=  <primary> [ and <conjunct> ]
17.  <primary>    ::=  not <primary>
18.                |   ( <expr> )
19.                |   <NAME>
20.                |   probe <NAME>
21.                |   ( true | false )

Figure 1: Backus-Naur Form (BNF) for Source Language


1  Source  Language 

The source language is based on CSP [3], with the addition of the probe [6] and a new
communication construct. A complete description of the language syntax is given in
Figure 1. We shall refer to this figure when deriving the individual transformation
rules.

A program in this language consists of a set of sequential processes with
interconnecting channels. Associated with each sequential process is a set of ports, a set of
private variables, and a list of statements to be executed sequentially. Ports that do
not connect to another process connect to the environment.

Only boolean variables are allowed. Variables are changed by assignment to true
(x up) or to false (x down). The selection ([<gcs>]) and repetition (*[<gcs>])
constructs are based on guarded commands. We use *[<sequence>] as an abbreviation
for *[true --> <sequence>].

Synchronization  between  two  processes  is  accomplished  by  zero-slack  communication 
actions  across  channels  denoted  by  pairs  of  ports.  Of  the  two  ports  that  make  up  a 
channel,  one  is  declared  active  and  the  other  is  declared  passive.  The  process  that 
owns the passive port can determine whether the other process is waiting for a
communication on this channel by evaluating a boolean condition called a probe. Probes
may  be  used  in  arbitrary  boolean  expressions. 

Though concurrently operating processes may not share variables, processes may
communicate data by exchanging values from small sets during a synchronization action.
When declaring a port, we specify both the send and receive sets of values, each set
being represented by a single integer. For example,

passive  L(3,2) 

declares  a  passive  port  L  with  send  set  {0,1,2}  and  receive  set  {0,1}.  The  syntactic 
construct  for  a  communication  action  allows  different  sequences  of  commands  to  be 
executed  based  on  the  value  received  during  a  communication.  An  execution  of  the 
communication  action  (on  the  same  port,  L) 

L(1) : [ 0 --> x down | 1 --> x up ] ,

sends  the  value  1  and  simultaneously  receives  either  a  1  or  a  0.  If  a  0  is  received,  x 
is  set  to  false;  if  a  1  is  received,  x  is  set  to  true.  We  allow  two  abbreviations  in  the 
specification  of  a  communication  action:  The  output  value  may  be  omitted  if  the  port 
has  only  one  send  value,  and  the  receive  value  selection  may  be  omitted  if  the  port 
has  only  one  receive  value. 
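
The port declaration and the communication action described above can be illustrated with a minimal Python sketch. The names `Port` and `communicate` are hypothetical and not part of the source language; the value received from the other process is passed in as an argument, since only one side of the channel is modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """A port such as `passive L(3,2)`: a name plus the sizes of its two value sets."""
    name: str
    n_send: int      # send set is {0, ..., n_send - 1}
    n_receive: int   # receive set is {0, ..., n_receive - 1}

    def send_set(self):
        return set(range(self.n_send))

    def receive_set(self):
        return set(range(self.n_receive))


def communicate(port, sent, received, responses, state):
    """Model one execution of `L(sent) : [ v --> command | ... ]`.

    `received` stands in for the value supplied by the process at the
    other end of the channel; the matching response is then executed.
    """
    if sent not in port.send_set():
        raise ValueError(f"{sent} is not in the send set of {port.name}")
    if received not in port.receive_set():
        raise ValueError(f"{received} is not in the receive set of {port.name}")
    responses[received](state)
    return state


# passive L(3,2) and the action L(1) : [ 0 --> x down | 1 --> x up ]
L = Port("L", 3, 2)
responses = {
    0: lambda s: s.update(x=False),  # x down
    1: lambda s: s.update(x=True),   # x up
}
state = communicate(L, 1, 0, responses, {"x": True})
```

In this sketch the value 1 is sent and a 0 is received, so x ends up false.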


2  Target  Language  —  Self-timed  Circuits 

The target of the compilation is a self-timed circuit: a set of circuit variables (nodes)
interconnected by a set of operators (gates). These circuits are designed to function
correctly regardless of the internal delays of the operators. The required operator
types include the combinational elements WIRE, AND, and OR, and the state-holding
elements shown in Figure 2. Each operator is defined in terms of a set of production
rules [4,5]. A production rule is a simple transition rule of the form $G \mapsto S$, where
$G$ is a boolean expression and $S$ is an assignment to true or false. All references
to a circuit variable are assumed to have the same value (isochronic forks) [1,4]. A
synchronizer, which cannot be represented in terms of production rules, is included
to allow the implementation of programs with negated probes. The synchronizer, as
well as the other operators, has been implemented as a CMOS standard cell.

Self-timed circuit implementations of concurrent programs are generated by
implementing each sequential process as a separate sub-circuit. The sub-circuits are
connected (by wire operators) only to implement communication actions. The
simultaneity required in the zero-slack communications is implemented using a four-phase
handshaking protocol. In order to implement general communication actions (those
in which data is transmitted), the usual request/acknowledge pair of wires is replaced
by one wire for each send value and one wire for each receive value.


3  Syntax-directed  Compilation 

An arbitrary program in the source concurrent language is compiled into a target
self-timed circuit by a syntax-directed translator, similar to that used in standard
program-to-machine-code compilers. Such a translator requires a set of BNF rules
describing the syntax of the source language and a set of translation rules describing
how to construct objects in the target language. In this application, an object in the
target language is a self-timed circuit. The translation rules specify how to generate
and connect circuits corresponding to the syntactic constructs. The translation rules
are derived in two logically separate phases: program transformation and basic process
compilation.

Figure 2: State-holding Operators

3.1  Process  Decomposition 

Process decomposition is the most commonly used program transformation. An
arbitrary program part $\beta$ is replaced by a single active communication and a separate
process implementing $\beta$:

\[
\alpha; \beta; \gamma \;\triangleright\; \textbf{active } A' \ \textbf{passive } A \ \ (\alpha; A'; \gamma \parallel *[[\overline{A} \rightarrow \beta; A]]) \ \ \textbf{channel } (A', A) .
\]

(Read '$\triangleright$' as "is replaced by".) Process decomposition does not introduce
concurrency; the active communication $A'$ cannot finish until $A$ and, thus, $\beta$ complete. The
original process and the new process may share variables and ports. These two
processes are never active concurrently; thus, exclusive access to each variable and port
is ensured. In the following, we do not explicitly declare the ports and channel used
in a process decomposition. The two ports of a channel are denoted by the same
capital letter. The primed letter represents the active port. We write the above process
decomposition as:

\[
\alpha; \beta; \gamma \;\triangleright\; \alpha; A'; \gamma \parallel (A/\beta) .
\]

A more general form of process decomposition is used to implement constructs
involving guard evaluation. An evaluation construct may be implemented in a separate
process, and if this is the case, the multi-valued result of the evaluation is communicated
back to the original process by a general communication action. Such a decomposition
is of the form:

\[
\begin{array}{l}
[\gamma_0 \rightarrow \beta_0 \mid \ldots \mid \gamma_{n-1} \rightarrow \beta_{n-1}] \\
\quad\triangleright\; \textbf{active } G'(1,n) \ \textbf{passive } G(n,1) \\
\qquad (\, G' : [0 \rightarrow \beta_0 \mid \ldots \mid n-1 \rightarrow \beta_{n-1}] \\
\qquad \parallel *[[\overline{G} \rightarrow [\gamma_0 \rightarrow G(0) \mid \ldots \mid \gamma_{n-1} \rightarrow G(n-1)]]] \\
\qquad ) \ \textbf{channel } (G', G) .
\end{array}
\]

Notice that each new process is less complicated than the original. The first process
performs a general communication action, while the second process evaluates the guard
set. Again, no concurrency is introduced by this transformation. The evaluation of
the guards $\gamma_i$ in the second process completes before a statement $\beta_i$ initiates. This
follows from the semantics of the general communication action.

To  precisely  describe  the  transformations  that  follow,  we  use  quantification  instead  of 
abbreviated  enumeration  to  denote  structures  of  indefinite  size.  Using  quantification 
notation,  the  above  decomposition  becomes: 

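\[
\begin{array}{l}
[\langle\, \mid i : 0 \le i < n :: \gamma_i \rightarrow \beta_i \rangle] \\
\quad\triangleright\; \textbf{active } G'(1,n) \ \textbf{passive } G(n,1) \\
\qquad (\, G' : [\langle\, \mid i : 0 \le i < n :: i \rightarrow \beta_i \rangle] \\
\qquad \parallel *[[\overline{G} \rightarrow [\langle\, \mid i : 0 \le i < n :: \gamma_i \rightarrow G(i) \rangle]]] \\
\qquad ) \ \textbf{channel } (G', G) .
\end{array}
\]

Again, we write the final form of this decomposition as:

\[
G' : [\langle\, \mid i : 0 \le i < n :: i \rightarrow \beta_i \rangle] \parallel (G(n,1)/\langle\, \mid i : 0 \le i < n :: \gamma_i \rangle) .
\]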

The  G(n,  1)  denotes  the  name  and  size  of  the  passive  port  used  in  the  decomposition. 
In  this  case,  the  number  of  output  values  is  n  (one  per  guard)  and  the  number  of  input 
values  is  1. 
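
The effect of this general decomposition can be sketched in Python as two cooperating functions. This is only an illustration under simplifying assumptions: the handshake on the channel is modeled as a function call, the nondeterministic choice among true guards is resolved by taking the first one, and a state in which no guard holds raises an error where the real guard process would keep waiting. The names `guard_process` and `main_process` are hypothetical.

```python
def guard_process(guards, state):
    """Passive side (G(n,1)/guards): report the index i of a true guard, as in G(i)."""
    for i, guard in enumerate(guards):
        if guard(state):
            return i
    raise RuntimeError("no guard holds; the real process would keep waiting")


def main_process(guards, statements, state):
    """Active side G' : [ i -> beta_i ]: receive i, then execute statement i."""
    i = guard_process(guards, state)  # guard evaluation completes first
    statements[i](state)              # only then does beta_i start
    return i


guards = [lambda s: s["a"], lambda s: not s["a"]]
statements = [
    lambda s: s.update(out="first"),
    lambda s: s.update(out="second"),
]
state = {"a": False, "out": None}
chosen = main_process(guards, statements, state)
```

Because the statement runs only after the call to `guard_process` has returned, the sketch also reflects the observation that no concurrency is introduced.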

3.2  Target  Language  of  the  Transformations 

The target language of the program transformations is a slight extension of the above
source language. Because of process decomposition, a restricted form of shared
variables and shared ports is allowed. Processes may share ports and variables if references
to these objects are not made concurrently. Also, concurrent execution of multiple
statements is allowed and denoted by the comma. For example, $\alpha; \beta_1, \beta_2; \gamma$ denotes
the execution of $\alpha$, followed by the concurrent execution of $\beta_1$ and $\beta_2$ and, finally, the
execution of $\gamma$.

Programs  translated  into  this  language  are  written  in  a  different  typeface  than  source 
language  programs.  This  distinction  is  not  necessary,  but  serves  as  an  aid  in  describing 
which syntactic forms have already been or have yet to be translated. For compatibility
with the notation of previous papers [4,5], we use overlines to denote probes ($\overline{X}$) and
up and down arrows to denote assignment ($x\uparrow$, $x\downarrow$).
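\[
\begin{array}{ll}
\text{2. Process} & Q' \\
\text{6. Sequence} & *[[\overline{Q} \rightarrow A'_1; A'_2; Q]] \\
\text{7. Skip} & *[Q] \\
\text{8. Assignment} & *[[\overline{Q} \rightarrow x\uparrow; Q]] \\
\text{9. Com} & *[[\overline{Q} \rightarrow L(j) : [\langle\, \mid k : 0 \le k < n :: k \rightarrow A'_k \rangle]; Q]] \\
\text{10. Selection} & *[[\overline{Q} \rightarrow G' : [1 \rightarrow Q \mid 0 \rightarrow \textbf{skip}]]] \\
\text{13. Control} & *[[\overline{Q} \rightarrow G' : [0 \rightarrow Q(0) \mid \langle\, \mid i : 1 \le i \le n :: i \rightarrow A'_i; Q(1) \rangle]]] \\
\text{14. Seq guards} & *[[\overline{Q} \rightarrow G' : [1 \rightarrow Q(1) \\
 & \qquad \mid 0 \rightarrow P' : [\langle\, \mid i : 2 \le i \le n :: i-1 \rightarrow Q(i) \rangle \\
 & \qquad\qquad \mid 0 \rightarrow Q(0) \\
 & \qquad ]]]] \\
\text{14. Con guards} & *[[\overline{Q} \rightarrow \langle\, , i : 0 \le i \le n :: G'_i : [1 \rightarrow x_i\uparrow \mid 0 \rightarrow \textbf{skip}] \rangle; \\
 & \qquad [\langle\, \mid i : 0 \le i \le n :: x_i \rightarrow Q(i); x_i\downarrow \rangle]]] \\
\text{14. Conjunction} & *[[\overline{Q} \wedge x_1 \wedge \ldots \wedge \neg x_k \wedge \ldots \rightarrow Q(i)]] \\
\text{15. Seq AND} & *[[\overline{Q} \rightarrow G'_1 : [1 \rightarrow G'_2 : [1 \rightarrow Q(1) \mid 0 \rightarrow Q(0)] \mid 0 \rightarrow Q(0)]]] \\
\text{15. Con AND} & *[[\overline{Q} \rightarrow G'_1 : [1 \rightarrow x_1\uparrow \mid 0 \rightarrow x_1\downarrow], G'_2 : [1 \rightarrow x_2\uparrow \mid 0 \rightarrow x_2\downarrow]; \\
 & \qquad [x_1 \wedge x_2 \rightarrow Q(1) \mid \neg x_1 \vee \neg x_2 \rightarrow Q(0)]]] \\
\text{17. Negation} & *[[\overline{Q} \rightarrow G' : [1 \rightarrow Q(0) \mid 0 \rightarrow Q(1)]]] \\
\text{19. Variable} & *[[\overline{Q} \wedge x \rightarrow Q(1) \mid \overline{Q} \wedge \neg x \rightarrow Q(0)]] \\
\text{20. Probe} & *[[\overline{Q} \wedge \overline{X} \rightarrow Q(1) \mid \overline{Q} \wedge \neg\overline{X} \rightarrow Q(0)]] \\
\text{21. True} & *[Q(1)]
\end{array}
\]

Figure 3: The above basic process types are generated as the result of the program
transformations. Each process corresponds to a syntactic construct and is readily compiled into a
self-timed circuit.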


3.3  Compilation  of  the  Basic  Constructs 

Figure 3 displays all of the basic processes produced by the program
transformations described in the next section. The remaining step is to compile these basic
processes into self-timed circuits. These compilations are straightforward applications
of the methods described in [4,5]. When possible, reshuffling is performed on passive
communications introduced by process decomposition. The complete compilations are
described in [1].

Both  the  program  transformations  and  the  resulting  circuits  for  each  basic  process 
are  described  together  succinctly  as  translation  rules  in  circuit  form.  These  rules  are 
shown  later  in  the  text  as  Figures  4,  5,  and  6. 


4  Transformation  Rules

We  now  derive  the  program  transformations  corresponding  to  each  syntactic  construct. 
The  equation  numbers  used  to  identify  the  transformations  correspond  to  the  numbers 
used  in  Figures  1  and  3.  Several  of  the  BNF  rules  are  only  used  to  define  precedence. 
We  do  not  define  program  transformations  corresponding  to  these  rules. 
4.1  Processes,  Declarations,  and  Channels


No transformations are applied to the parallel composition of processes or to the
declaration of ports and variables. The only transformation needed in the first five
BNF rules involves rule 2. For compatibility with the following transformations, one
process decomposition is applied so that all $\langle\text{sequence}\rangle$ forms are guarded by a passive
communication:

\[
\begin{array}{l}
\langle\text{process}\rangle \;\triangleright\; \{\langle\text{port}\rangle\}\{\langle\text{var}\rangle\}\langle\text{sequence}\rangle \\
\quad\triangleright\; \{\langle\text{port}\rangle\}\{\langle\text{var}\rangle\}(Q' \parallel (Q/\langle\text{sequence}\rangle)) .
\end{array}
\]

The basic process $Q'$ performs exactly one active communication; thus, $\langle\text{sequence}\rangle$
also is executed exactly once.

4.2  Sequencing 

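The sequential composition of a $\langle\text{statement}\rangle$ and a $\langle\text{sequence}\rangle$ is transformed by
process decomposition into the sequence of two active communications and a process
implementing each statement:

\[
\begin{array}{l}
(Q/\langle\text{sequence}\rangle) \;\triangleright\; (Q/\langle\text{statement}\rangle_1; \langle\text{sequence}\rangle_2) \\
\quad\triangleright\; *[[\overline{Q} \rightarrow A'_1; A'_2; Q]] \parallel (A_1/\langle\text{statement}\rangle_1) \parallel (A_2/\langle\text{sequence}\rangle_2) .
\end{array}
\]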

4.3  Skip 

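The skip statement is implemented as the infinite repetition of a passive
communication:

\[
(Q/\langle\text{statement}\rangle) \;\triangleright\; (Q/\textbf{skip}) \;\triangleright\; *[[\overline{Q} \rightarrow \textbf{skip}; Q]] \;\triangleright\; *[[\overline{Q} \rightarrow Q]] \;\triangleright\; *[Q] . \qquad (7)
\]

The probe of a passive communication is always a precondition to performing the
action, so we may remove the selection statement with guard $\overline{Q}$.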

4.4  Assignment 

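The assignment statement decomposes into

\[
(Q/\langle\text{statement}\rangle) \;\triangleright\; (Q/\langle\text{NAME}\rangle\ \textbf{up}) \;\triangleright\; *[[\overline{Q} \rightarrow x\uparrow; Q]] , \qquad (8)
\]

a simple process implementing a register. The name $x$ represents an arbitrary $\langle\text{NAME}\rangle$.
A similar decomposition is applied to assignments of false.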

4.5  Communication 

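By applying the BNF rules corresponding to communication, we get

\[
(Q/\langle\text{statement}\rangle) \;\triangleright\; (Q/L(j) : [\langle\, \mid k : 0 \le k < n :: k \rightarrow \langle\text{sequence}\rangle_k \rangle]) , \qquad (9)
\]

where $L$ and $j$ represent an arbitrary $\langle\text{NAME}\rangle$ and $\langle\text{INT}\rangle$, respectively (rules 9, 11
and 12). Process decomposition produces new processes to implement each $\langle\text{sequence}\rangle$,
yielding:

\[
\begin{array}{l}
*[[\overline{Q} \rightarrow L(j) : [\langle\, \mid k : 0 \le k < n :: k \rightarrow A'_k \rangle]; Q]] \\
\quad \parallel \langle\, \parallel k : 0 \le k < n :: (A_k/\langle\text{sequence}\rangle_k) \rangle .
\end{array}
\]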


4.6  Selection  and  Repetition 

To  derive  the  implementation  of  the  control  structures,  we  first  review  the  semantics 
of  these  constructs.  Operationally,  the  execution  of  the  selection  statement  can  be 
described  as:  Repetitively  evaluate  each  guard  until  one  or  more  is  true,  then  pick  a 
true  one  and  execute  the  corresponding  command.  The  program  transformation, 

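\[
\begin{array}{l}
*[[\overline{Q} \rightarrow [\gamma_1 \rightarrow \beta_1 \mid \ldots \mid \gamma_n \rightarrow \beta_n]; Q]] \\
\quad\triangleright\; *[[\overline{Q} \rightarrow [\gamma_1 \rightarrow \beta_1; Q \mid \ldots \mid \gamma_n \rightarrow \beta_n; Q \mid \langle \wedge\, i :: \neg\gamma_i \rangle \rightarrow \textbf{skip}]]] ,
\end{array}
\]

does not change the meaning of the selection statement, but makes it easier to
implement because at least one guard will evaluate to true. Similarly, the repetition
statement may be transformed into:

\[
\begin{array}{l}
*[[\overline{Q} \rightarrow *[\gamma_1 \rightarrow \beta_1 \mid \ldots \mid \gamma_n \rightarrow \beta_n]; Q]] \\
\quad\triangleright\; *[[\overline{Q} \rightarrow [\gamma_1 \rightarrow \beta_1 \mid \ldots \mid \gamma_n \rightarrow \beta_n \mid \langle \wedge\, i :: \neg\gamma_i \rangle \rightarrow Q]]] .
\end{array}
\]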

The  new  forms  for  selection  and  repetition  are  similar.  Only  the  position  of  the 
communication  Q  is  different.  We  can  perform  a  general  process  decomposition  on 
both  the  selection  and  repetition  forms  and  use  the  same  implementation  of  a  guarded 
command  set: 


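\[
\begin{array}{l}
(Q/\langle\text{statement}\rangle) \;\triangleright\; (Q/[\langle\text{gcs}\rangle]) \\
\quad\triangleright\; *[[\overline{Q} \rightarrow G' : [1 \rightarrow Q \mid 0 \rightarrow \textbf{skip}]]] \parallel (G(2,1)/\langle\text{gcs}\rangle) ,
\end{array}
\]

where

\[
\begin{array}{l}
(G(2,1)/\langle\text{gcs}\rangle) \;\triangleright\; (G(2,1)/\langle\, \mid i : 1 \le i \le n :: \langle\text{expr}\rangle_i \rightarrow \langle\text{sequence}\rangle_i \rangle) \\
\quad\triangleright\; *[[\overline{G} \rightarrow [\langle\, \mid i : 1 \le i \le n :: \langle\text{expr}\rangle_i \rightarrow \langle\text{sequence}\rangle_i; G(1) \rangle \\
\qquad\qquad \mid \langle \wedge\, i : 1 \le i \le n :: \neg\langle\text{expr}\rangle_i \rangle \rightarrow G(0) \\
\qquad ]]] .
\end{array}
\]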

The value returned by the communication $G$ denotes the result of evaluating the
disjunction of the guards within the guarded command set. Notice that in the selection
statement, the guarded command set is reevaluated if a false value is returned.
Repetition is the opposite of selection. Reevaluation occurs if the guarded command set
returns true as described by the basic process:

\[
*[[\overline{Q} \rightarrow G' : [0 \rightarrow Q \mid 1 \rightarrow \textbf{skip}]]] .
\]
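
The symmetry between the two control processes can be sketched in Python. The sketch assumes that the guarded command set is modeled as a function returning 1 after executing a command whose guard holds and 0 when all guards are false, that the first true guard is taken where the language allows any true guard, and that polling is bounded so the example terminates. All function names are hypothetical.

```python
def guarded_command_set(gcs, state):
    """Model (G(2,1)/<gcs>): run one enabled command and return 1, else return 0."""
    for guard, command in gcs:
        if guard(state):
            command(state)
            return 1
    return 0


def selection(gcs, state, max_polls=1000):
    """G' : [1 -> Q | 0 -> skip]: re-evaluate until the set returns 1."""
    for _ in range(max_polls):
        if guarded_command_set(gcs, state) == 1:
            return
    raise RuntimeError("no guard became true within the polling bound")


def repetition(gcs, state):
    """G' : [0 -> Q | 1 -> skip]: re-evaluate as long as the set returns 1."""
    while guarded_command_set(gcs, state) == 1:
        pass


# Guarded command set [ a --> a down | b --> b down ]
gcs = [
    (lambda s: s["a"], lambda s: s.update(a=False)),
    (lambda s: s["b"], lambda s: s.update(b=False)),
]

once = {"a": True, "b": True}
selection(gcs, once)      # executes exactly one command

looped = {"a": True, "b": True}
repetition(gcs, looped)   # executes commands until every guard is false
```

Both control processes share `guarded_command_set`; only the value on which they complete differs.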
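4.7  Guarded  Command  Sets


The guarded command set process is decomposed into a control process that sequences
guard evaluation and the associated command execution, a set of processes that
implement the commands, and a process that evaluates the guard set:

\[
\begin{array}{l}
(Q(2,1)/\langle\text{gcs}\rangle) \;\triangleright\; (Q(2,1)/\langle\, \mid i : 1 \le i \le n :: \langle\text{expr}\rangle_i \rightarrow \langle\text{sequence}\rangle_i \rangle) \\
\quad\triangleright\; *[[\overline{Q} \rightarrow G' : [0 \rightarrow Q(0) \mid \langle\, \mid i : 1 \le i \le n :: i \rightarrow A'_i; Q(1) \rangle]]] \\
\qquad \parallel \langle\, \parallel i : 1 \le i \le n :: (A_i/\langle\text{sequence}\rangle_i) \rangle \\
\qquad \parallel (G(n+1,1)/\langle\, \mid i : 0 \le i \le n :: \langle\text{expr}\rangle_i \rangle) .
\end{array}
\]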

(Rules  13  and  14  are  applied  here.)  The  control  process  provides  a  separation  between 
the  issues  of  guard  evaluation  and  statement  execution  by  storing  the  guard  that 
evaluated  to  true.  This  process  distinguishes  between  the  program  state  prior  to  the 
guarded  command  set  and  the  program  states  following  the  arrows  in  each  guarded 
command.  Guard  evaluation  completes  before  subsequent  statements  change  variable 
values. The guard set process includes $\langle\text{expr}\rangle_0$, the negation of the disjunction of all
the other expressions.

The  non-trivial  translation  rules  corresponding  to  the  BNF  rules  1-12  are  shown  in 
Figure  4.  The  remainder  of  the  paper  is  concerned  with  the  compilation  of  the  guard 
set  process. 

4.8  Guard  Set  Evaluation 

The  semantics  of  the  language  does  not  specify  the  order  in  which  to  evaluate  the 
guards.  Whereas  the  other  constructs  require  a  strict  ordering  between  command 
executions,  concurrency  may  be  exploited  in  guard  evaluation.  Because  of  the  potential 
gains  of  concurrency,  there  is  no  single  best  scheme  for  guard  evaluation.  Instead, 
depending  on  the  syntactic  structure  of  the  guard  set  and  on  invariant  properties 
of  the  original  program,  different  evaluation  schemes  will  produce  the  most  efficient 
implementation.  Of  the  four  schemes  we  describe,  one  is  entirely  sequential,  while  the 
other  three  represent  different  methods  for  using  and  controlling  concurrency. 

All four decomposition schemes require that the guard sets be in special forms. The
special forms consist of both syntactic and invariant properties. For each property,
we define a program transformation that, from an arbitrary guard set, produces an
equivalent set in which the property holds. We choose to define both the properties
and transformations because often a programmer can establish the properties (in
particular, the invariants) by more subtle transformations. An automatic compiler can
bypass these transformations if the programmer specifies in the source program that
the desired properties are satisfied. We now define some properties and
transformations on the guard set process,

\[
(Q(n+1,1)/\langle\, \mid i : 0 \le i \le n :: \langle\text{expr}\rangle_i \rangle) .
\]
Figure 4: Translation Rules Excluding Those for Guard Set Evaluation (panels include 10. Repetition and 13. Control)


4.8.1  Syntactic  and  Invariant  Properties 


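Mutual  Exclusion  A guard set is exclusive if, when it is evaluated, at most one
guard is true. This property is expressed by the invariant,

\[
\neg\overline{Q} \vee \langle \wedge\, i,j : 0 \le i,j \le n \wedge i \ne j :: (\neg\langle\text{expr}\rangle_i \vee \neg\langle\text{expr}\rangle_j) \rangle .
\]

The invariant can always be achieved by successive strengthenings of the guards, which
will produce an entirely deterministic implementation of the original non-deterministic
guard set. Precisely, for $1 \le i \le n$,

\[
\langle\text{expr}\rangle'_i \triangleq \langle \wedge\, j : 1 \le j < i :: \neg\langle\text{expr}\rangle_j \rangle \wedge \langle\text{expr}\rangle_i
\]

and

\[
\langle\text{expr}\rangle'_0 \triangleq \langle\text{expr}\rangle_0 .
\]

(Read '$\triangleq$' as "is defined as".)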
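
The strengthening can be checked on a small example with the following Python sketch. Guards are modeled as predicates over a dictionary of boolean variables, with the guard at index 0 playing the role of the all-false expression; the helper names `strengthen` and `is_exclusive` are hypothetical, and exclusiveness is verified by brute-force enumeration of all variable assignments.

```python
from itertools import product


def strengthen(guards):
    """Return guards' where guards'[i] = guards[i] and no earlier guard (1 <= j < i) holds.

    guards[0] is the all-false expression and is left unchanged.
    """
    def make(i):
        return lambda s: guards[i](s) and not any(guards[j](s) for j in range(1, i))

    return [guards[0]] + [make(i) for i in range(1, len(guards))]


def is_exclusive(guards, names):
    """Check by enumeration that at most one guard is true in every state."""
    for values in product([False, True], repeat=len(names)):
        state = dict(zip(names, values))
        if sum(bool(g(state)) for g in guards) > 1:
            return False
    return True


# Guard set with expr_1 = a, expr_2 = b, and expr_0 = not a and not b.
original = [
    lambda s: not s["a"] and not s["b"],
    lambda s: s["a"],
    lambda s: s["b"],
]
strengthened = strengthen(original)
```

Here the original set is not exclusive, since both guards hold when a and b are true, whereas in the strengthened set the second guard becomes b and not a.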

Disjoint  Disjunctive  Form  Two of the implementation schemes require that each
guard be in so-called "disjoint disjunctive form". Each guard must be expressed in
AND-OR form, and when the guard set is evaluated, at most one AND term is true.
That is, a guard set is ddf if, for each expression $\langle\text{expr}\rangle_i$, $0 \le i \le n$,

\[
\neg\overline{Q} \vee \langle \wedge\, j,k : 0 \le j,k < m_i \wedge j \ne k :: (\neg\langle\text{conj}\rangle_i^j \vee \neg\langle\text{conj}\rangle_i^k) \rangle ,
\]

where $\langle\text{expr}\rangle_i = \langle \vee\, j : 0 \le j < m_i :: \langle\text{conj}\rangle_i^j \rangle$ and each $\langle\text{conj}\rangle_i^j$ is a simple conjunction
of possibly negated variables and probes.

The program transformation to achieve the ddf property is similar to the
transformation of an arbitrary expression into disjunctive normal form, except that the conjuncts
must be successively strengthened to achieve disjointness. See [1] for details. Notice
that this transformation potentially increases the size of a guard set to a size exponential
in the number of its variables.


Negated  Probes  An expression is stable if once it becomes true, it remains true.
The underlying compilation method allows only restricted implementations of
non-stable expressions. Since process decomposition does not introduce concurrency, the
guard set process is not active concurrently with any processes modifying variables;
thus, all variables are stable in the guard set process. All positive probes are stable as well,
but negated probes are not. A negated probe may change asynchronously from true
to false. We call a guard set noneg if it contains no negated probes.

Any expression containing a negated probe is potentially non-stable. We define a
transformation that stabilizes all negated probes. Each probe in the guard set is
evaluated and assigned to a local variable before the boolean expressions are evaluated.
References to a probe's value within the guard set refer to the corresponding variable's
value.
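Let $\mathcal{X}$ be the set of all negated probes named in the guard set. To transform the guard
set into noneg form,

\[
\begin{array}{l}
(Q(n+1,1)/\langle\, \mid i : 0 \le i \le n :: \langle\text{expr}\rangle_i \rangle) \\
\quad\triangleright\; *[[\overline{Q} \rightarrow \langle\, , X : X \in \mathcal{X} :: [\overline{X} \rightarrow x\uparrow \mid \neg\overline{X} \rightarrow x\downarrow] \rangle; \\
\qquad\qquad G' : [\langle\, \mid i : 0 \le i \le n :: i \rightarrow Q(i) \rangle]]] \\
\qquad \parallel (G(n+1,1)/\langle\, \mid i : 0 \le i \le n :: \langle\text{expr}\rangle'_i \rangle) ,
\end{array}
\]

where, for $0 \le i \le n$,

\[
\langle\text{expr}\rangle'_i \triangleq \langle X : X \in \mathcal{X} :: \text{replace } \overline{X} \text{ by } x \rangle \langle\text{expr}\rangle_i .
\]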

Non-atomic  Evaluation  If a guard set is not evaluated atomically, expressions
that change value during the evaluation cause special problems. For example, the
expression $\overline{X} \vee \neg\overline{X}$ may evaluate to false if different values for $\overline{X}$ are used in the two
subexpressions. A guard set is nonatomic if the subexpressions within it can be
evaluated in any order. This property is achieved if each probe in the guard set is
named only once. The same transformation used to achieve the noneg property will
put an arbitrary guard set in this form if we let $\mathcal{X}$ be the set of all probes named more
than once.
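
The hazard, and the way the transformation removes it, can be reproduced in a few lines of Python. The class `FlippingProbe` is a hypothetical stand-in for a probe that changes from false to true between two successive reads; it is an assumption made for illustration and does not model any circuit timing.

```python
class FlippingProbe:
    """Hypothetical probe that yields a preset sequence of values, one per read."""

    def __init__(self, values):
        self._values = iter(values)

    def read(self):
        return next(self._values)


def evaluate_direct(probe):
    """Evaluate X or not X, reading the probe once per occurrence."""
    return probe.read() or (not probe.read())


def evaluate_snapshot(probe):
    """First copy the probe into a local variable x, then evaluate x or not x."""
    x = probe.read()  # [X -> x up | not X -> x down]
    return x or (not x)


# The probe is false at the first read and true at the second.
direct = evaluate_direct(FlippingProbe([False, True]))
snapshot = evaluate_snapshot(FlippingProbe([False, True]))
```

With the direct evaluation the tautology yields false, while the snapshot version names the probe only once and yields true regardless of when the probe changes.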


4.8.2  Evaluation  Schemes 

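Sequential  Guard  Evaluation  This scheme requires that the guard set fulfill the
nonatomic property. The guards are evaluated one by one until one evaluates to true.
If none evaluates to true, the communication $Q(0)$ is performed. Process decomposition
for this scheme may be defined recursively. If $n > 1$,

\[
\begin{array}{l}
(Q(n+1,1)/\langle\, \mid i : 0 \le i \le n :: \langle\text{expr}\rangle_i \rangle) \\
\quad\triangleright\; *[[\overline{Q} \rightarrow G' : [1 \rightarrow Q(1) \\
\qquad\qquad \mid 0 \rightarrow P' : [\langle\, \mid i : 2 \le i \le n :: i-1 \rightarrow Q(i) \rangle \\
\qquad\qquad\qquad \mid 0 \rightarrow Q(0) \\
\qquad\qquad ]]]] \\
\qquad \parallel (G(2,1)/\langle\text{expr}\rangle_1) \\
\qquad \parallel (P(n,1)/\langle\text{expr}\rangle_0 \mid \langle\, \mid i : 2 \le i \le n :: \langle\text{expr}\rangle_i \rangle) ;
\end{array}
\]

and, if $n = 1$,

\[
(Q(2,1)/\langle\text{expr}\rangle_0 \mid \langle\text{expr}\rangle_1) \;\triangleright\; (Q(2,1)/\langle\text{expr}\rangle_1) .
\]

We note that the exclusive property is not required for this decomposition.
Conditional evaluation ensures mutual exclusion among the guards. For the same reason,
$\langle\text{expr}\rangle_0$ is not used.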
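
The recursive structure of this decomposition can be mirrored by a short Python function. The sketch takes the guards $\langle\text{expr}\rangle_1, \ldots, \langle\text{expr}\rangle_n$ as a list of predicates and returns the value k communicated as $Q(k)$; the name `seq_eval` is hypothetical, and the processes and channels are collapsed into ordinary function calls.

```python
def seq_eval(guards, state):
    """Return k such that Q(k) is performed: the 1-based index of the
    first true guard, or 0 if no guard is true.

    guards is the list [expr_1, ..., expr_n]; expr_0 is not needed.
    """
    first, rest = guards[0], guards[1:]
    if not rest:                     # n = 1: the process is (Q(2,1)/expr_1)
        return 1 if first(state) else 0
    if first(state):                 # G' returns 1
        return 1                     # Q(1)
    p = seq_eval(rest, state)        # P' evaluates expr_2 .. expr_n
    return 0 if p == 0 else p + 1    # i-1 -> Q(i), 0 -> Q(0)


guards = [lambda s: s["a"], lambda s: s["b"], lambda s: s["c"]]
third = seq_eval(guards, {"a": False, "b": False, "c": True})
none = seq_eval(guards, {"a": False, "b": False, "c": False})
```

Later guards are evaluated only when all earlier ones are false, so at most one result is ever reported even when several guards hold.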

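Evaluation of each individual guard is implemented conditionally. Sequential
evaluation of the and connective starts by evaluating the first sub-expression. If the first
sub-expression is true, the value of the second sub-expression determines the value of
the conjunction. Otherwise, the value of the conjunction is false:

\[
\begin{array}{l}
(Q(2,1)/\langle\text{conjunct}\rangle) \;\triangleright\; (Q(2,1)/\langle\text{primary}\rangle_1\ \textbf{and}\ \langle\text{conjunct}\rangle_2) \\
\quad\triangleright\; *[[\overline{Q} \rightarrow G'_1 : [1 \rightarrow G'_2 : [1 \rightarrow Q(1) \mid 0 \rightarrow Q(0)] \mid 0 \rightarrow Q(0)]]] \\
\qquad \parallel (G_1(2,1)/\langle\text{primary}\rangle_1) \parallel (G_2(2,1)/\langle\text{conjunct}\rangle_2) .
\end{array}
\]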


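The sequential scheme and the next scheme (concurrent-all) share the same
decompositions for the remaining expression constructs. De Morgan's Law allows the or
connective to be defined in terms of and and not:

\[
\begin{array}{l}
(Q(2,1)/\langle\text{expr}\rangle) \;\triangleright\; (Q(2,1)/\langle\text{conjunct}\rangle_1\ \textbf{or}\ \langle\text{expr}\rangle_2) \\
\quad\triangleright\; (Q(2,1)/\textbf{not}\ (\textbf{not}\ \langle\text{conjunct}\rangle_1\ \textbf{and}\ \textbf{not}\ \langle\text{expr}\rangle_2)) .
\end{array}
\]

Similarly, false is defined in terms of true and not. Negation exchanges the results
of the evaluation:

\[
\begin{array}{l}
(Q(2,1)/\langle\text{primary}\rangle) \;\triangleright\; (Q(2,1)/\textbf{not}\ \langle\text{primary}\rangle_1) \\
\quad\triangleright\; *[[\overline{Q} \rightarrow G' : [0 \rightarrow Q(1) \mid 1 \rightarrow Q(0)]]] \parallel (G(2,1)/\langle\text{primary}\rangle_1) .
\end{array}
\]

The evaluation of a simple variable is implemented by the process:

\[
\begin{array}{l}
(Q(2,1)/\langle\text{primary}\rangle) \;\triangleright\; (Q(2,1)/\langle\text{NAME}\rangle) \\
\quad\triangleright\; *[[\overline{Q} \rightarrow [x \rightarrow Q(1) \mid \neg x \rightarrow Q(0)]]] .
\end{array}
\]

If a probe is named only once in a guard set, the evaluation of a probe is implemented
by the process:

\[
\begin{array}{l}
(Q(2,1)/\langle\text{primary}\rangle) \;\triangleright\; (Q(2,1)/\textbf{probe}\ \langle\text{NAME}\rangle) \\
\quad\triangleright\; *[[\overline{Q} \rightarrow [\overline{X} \rightarrow Q(1) \mid \neg\overline{X} \rightarrow Q(0)]]] .
\end{array}
\]

The evaluation of true has an implementation similar to that of skip:

\[
\begin{array}{l}
(Q(2,1)/\langle\text{primary}\rangle) \;\triangleright\; (Q(2,1)/\textbf{true}) \\
\quad\triangleright\; *[[\overline{Q} \rightarrow [\textbf{true} \rightarrow Q(1) \mid \textbf{false} \rightarrow Q(0)]]] \;\triangleright\; *[Q(1)] .
\end{array}
\]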

Figure  5  shows  all  the  translation  rules  used  in  the  sequential  guard  evaluation  scheme. 
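
The expression decompositions of the sequential scheme can be summarized by a small recursive evaluator in Python. Expressions are represented as nested tuples, a representation chosen only for this sketch; variables and probes are looked up in a dictionary `env`, and the returned 0 or 1 corresponds to the value communicated on $Q$. The function name `evaluate` is hypothetical.

```python
def evaluate(expr, env):
    """Evaluate an expression tree and return 0 or 1, as communicated on Q."""
    kind = expr[0]
    if kind == "true":
        return 1                                   # *[Q(1)]
    if kind == "false":                            # false is defined as not true
        return evaluate(("not", ("true",)), env)
    if kind in ("var", "probe"):                   # [x -> Q(1) | not x -> Q(0)]
        return 1 if env[expr[1]] else 0
    if kind == "not":                              # G' : [0 -> Q(1) | 1 -> Q(0)]
        return 1 - evaluate(expr[1], env)
    if kind == "and":                              # conditional evaluation
        if evaluate(expr[1], env) == 1:
            return evaluate(expr[2], env)
        return 0
    if kind == "or":                               # De Morgan's Law
        rewritten = ("not", ("and", ("not", expr[1]), ("not", expr[2])))
        return evaluate(rewritten, env)
    raise ValueError(f"unknown expression kind: {kind}")


a_or_b = ("or", ("var", "a"), ("var", "b"))
result = evaluate(a_or_b, {"a": False, "b": True})
```

Only the and case needs its own control structure; or and false are reduced to the other constructs before evaluation, as in the decompositions above.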


Concurrent-all  Guard  Evaluation  This  scheme  requires  that  the  guard  set  fulfill 
both  the  exclusive  and  the  nonatomic  properties.  Each  guard  is  evaluated  sepa¬ 
rately  and  concurrently.  The  variable  corresponding  to  the  true  guard  (exactly  one 
because  the  exclusive  property  holds)  is  raised.  When  all  guards  have  been  evalu¬ 
ated,  the  communication  corresponding  to  the  set  variable  is  performed,  and  then  the 
variable  is  reset: 

(Q(n+1,1)/(| i : 0 ≤ i ≤ n :: ⟨expr⟩ᵢ))

▷ *[[Q̄ → (, i : 0 ≤ i ≤ n :: Gᵢ : [1 → xᵢ↑ | 0 → skip]);

[(| i : 0 ≤ i ≤ n :: xᵢ → Q(i); xᵢ↓)]]]

∥ (∥ i : 0 ≤ i ≤ n :: (Gᵢ(2,1)/⟨expr⟩ᵢ)) .
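
The control structure of this rule can be imitated in software. The Python sketch below is an illustration under stated assumptions, not the circuit-level process: each guard channel is modelled as a callable returning 1 or 0, threads stand in for the concurrent communications, and the name concurrent_all and the placement of the all-false expression at index 0 are choices made for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def concurrent_all(guards):
    """Return the index i of the communication Q(i) chosen for one request.

    guards: zero-argument callables returning 1 (true) or 0 (false). The
    caller must ensure the exclusive property: exactly one returns 1.
    """
    x = [False] * len(guards)            # one variable x_i per guard

    def run(i):
        if guards[i]() == 1:             # result 1 raises x_i, result 0 skips
            x[i] = True

    with ThreadPoolExecutor(max_workers=len(guards)) as pool:
        list(pool.map(run, range(len(guards))))   # wait for every guard

    raised = [i for i, xi in enumerate(x) if xi]
    if len(raised) != 1:
        raise ValueError("guard set is not exclusive")
    i = raised[0]
    x[i] = False                         # reset x_i once Q(i) is done
    return i

# Hypothetical state for a guard set in exclusive form.
X, s, Y = True, False, True
guards = [
    lambda: int(not (X and s) and not Y),   # index 0: all-false expression
    lambda: int(X and s),
    lambda: int(not (X and s) and Y),
]
selected = concurrent_all(guards)
```

The second phase starts only after every guard has replied, which mirrors the semicolon separating the two parts of the process.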
[Figure 5 panels recovered: 14. Seq guards; 17. Negation; 20. Probe; 21. True]

Figure 5: Translation Rules for Sequential Guard Evaluation

To  evaluate  the  and  connective,  both  sub-expressions  are  evaluated  concurrently  and 
the  results  stored  in  variables.  The  conjunction  of  these  values  is  returned  by  the 
communication  Q : 

(Q(2,1)/⟨conjunct⟩) ▷ (Q(2,1)/⟨primary⟩₁ and ⟨conjunct⟩₂)

▷ *[[Q̄ → G₁ : [1 → x₁↑ | 0 → x₁↓], G₂ : [1 → x₂↑ | 0 → x₂↓];

[x₁ ∧ x₂ → Q(1)
| ¬x₁ ∨ ¬x₂ → Q(0)

]]]

∥ (G₁(2,1)/⟨primary⟩₁) ∥ (G₂(2,1)/⟨conjunct⟩₂) .
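
As a rough software analogue (a sketch, not the circuit), the fragment below builds the channel for an and from two operand channels. The names and_process and operand, and the use of threads for the two concurrent communications, are assumptions of the sketch. Unlike a short-circuit evaluation, both operands are always consulted.

```python
from concurrent.futures import ThreadPoolExecutor

def and_process(g1, g2):
    """Channel for an and, built from the channels of its two operands."""
    def q():
        with ThreadPoolExecutor(max_workers=2) as pool:
            f1, f2 = pool.submit(g1), pool.submit(g2)    # both in parallel
            x1, x2 = f1.result() == 1, f2.result() == 1  # store the results
        return 1 if (x1 and x2) else 0   # x1 and x2: reply 1, otherwise 0
    return q

calls = []

def operand(name, value):
    """Hypothetical operand channel that records when it is consulted."""
    def g():
        calls.append(name)
        return value
    return g

result = and_process(operand("G1", 0), operand("G2", 1))()
```

Here the first operand is false, yet the second is still evaluated, because the conjunction is formed only after both variables have been assigned.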

Figure  6  shows  the  translation  rules  corresponding  to  concurrent-all  guard  evaluation. 


Concurrent-one Guard Evaluation  In this scheme, all guards are evaluated
simultaneously. The evaluation of each guard (in fact, each conjunct of each
guard) is implemented by a separate process. For this decomposition to be
valid, no two of these processes may operate concurrently. This is ensured by
both the exclusive and the ddf properties. After decomposition, each remaining
basic process implements a simple conjunction of variables and probes:

(Q(n+1,1)/(| i : 0 ≤ i ≤ n :: (∨ j : 0 ≤ j < mᵢ :: ⟨conj⟩ᵢⱼ)))

▷ (∥ i : 0 ≤ i ≤ n :: (∥ j : 0 ≤ j < mᵢ :: *[[Q̄ ∧ ⟨conj⟩ᵢⱼ → Q(i)]])) .

The  noneg  property  is  required  for  the  implementation  of  each  conjunction. 

Concurrent-one-wait  Guard  Evaluation  In  special  cases,  guard  evaluation  may 
be  implemented  as  above  without  performing  the  all-false  communication  Q(0).  This 
evaluation  scheme  is  possible  when:  i)  the  guard  set  is  part  of  a  selection  statement 
(no  repetitions);  ii)  the  guard  set  has  the  exclusive  and  ddf  properties;  and  iii)  after 
replacing ⟨expr⟩₀ by false, no negated probes are named in the guard set.
Figure 6: Translation Rules for Concurrent-all Guard Evaluation

4.8.3  Applying  the  Guard  Evaluation  Schemes  to  an  Example 

We  illustrate  the  four  different  schemes  for  decomposing  guard  sets  by  applying  these 
schemes  to  a  small  program  fragment.  Consider  the  program  fragment: 

...; [X̄ ∧ s → ... | Ȳ → ...] .

Syntax-directed  application  of  the  program  transformations  results  in  an  intermediate 
form  containing  the  process: 

(Q(3,1)/X̄ ∧ s | Ȳ | ¬(X̄ ∧ s ∨ Ȳ)) .

We construct implementations of this guard set using the four different
evaluation schemes. These circuits are shown in Figure 7. The number of
two-input operators required for each implementation is used as a general space
comparison between the schemes. For operators with more than two inputs, each
extra input adds ½ to the operator count.
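
The counting convention can be stated as a small computation. This is a sketch in Python with hypothetical function names, and it reads the per-input increment as one half, consistent with the fractional operator counts quoted for the circuits.

```python
from fractions import Fraction

def operator_cost(inputs):
    """One unit for up to two inputs, plus one half per additional input."""
    return 1 + Fraction(max(inputs - 2, 0), 2)

def circuit_cost(fan_ins):
    """Total count for a circuit given the fan-in of each of its operators."""
    return sum((operator_cost(n) for n in fan_ins), Fraction(0))

# Hypothetical circuit: two two-input operators and one three-input operator.
total = circuit_cost([2, 2, 3])
```

Under this convention a three-input operator counts as 1½, so the hypothetical circuit above totals 3½.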

Sequential  Since  the  guard  set  has  the  nonatomic  property  (excluding  the  all-false 
expression),  the  decomposition  is  straightforward,  requiring  no  initial  transformation 
of  the  guard  set.  The  resulting  circuit  requires  2  AND,  2  SYNC  and  1  OR  operators. 

Concurrent-all  We  first  put  the  guard  set  into  a  form  that  satisfies  the  exclusive 
property: 

X̄ ∧ s | ¬(X̄ ∧ s) ∧ Ȳ | ¬(X̄ ∧ s) ∧ ¬Ȳ .

Since the probes X̄ and Ȳ are named more than once, this transformed guard set
does not have the nonatomic property. However, in this scheme, the evaluation
of common sub-expressions can be shared between guards. In particular, probes
need only be evaluated once per guard set; thus, the nonatomic property is
satisfied. The resulting circuit requires 2 AND, 2 SYNC, 16½ C, and 7½ OR
operators.
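
One way to obtain a guard set with the exclusive property is to let each guard exclude all earlier ones. The Python sketch below illustrates that idea on the example; it is an assumption of the sketch that this prioritisation is how the form is derived, and the names make_exclusive, original and exclusive are hypothetical. Probes are treated as already-sampled Boolean values.

```python
from itertools import product

def make_exclusive(guards):
    """Guard k becomes: no earlier guard holds, and guard k holds."""
    def prioritised(k):
        return lambda env: (not any(g(env) for g in guards[:k])) and guards[k](env)
    return [prioritised(k) for k in range(len(guards))]

original = [
    lambda e: e["X"] and e["s"],
    lambda e: e["Y"],
    lambda e: not (e["X"] and e["s"] or e["Y"]),   # all-false expression
]
exclusive = make_exclusive(original)

# Enumerate all eight states and count how many guards hold in each.
states = [dict(zip("XsY", v)) for v in product([False, True], repeat=3)]
true_counts = [sum(bool(g(e)) for g in exclusive) for e in states]
```

Every state makes exactly one transformed guard true, and the second transformed guard agrees with the negation of the first guard conjoined with the probe of Y.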
Concurrent-one  In this case, the guard set must satisfy both the exclusive and
the ddf properties. The transformed guard set is:

X̄ ∧ s | ¬X̄ ∧ Ȳ ∨ X̄ ∧ ¬s ∧ Ȳ | ¬X̄ ∧ ¬Ȳ ∨ X̄ ∧ ¬s ∧ ¬Ȳ .

Notice the extra literals needed to ensure disjointness among the conjuncts.
Since negated probes are named in this guard set, the value of each probe is
assigned to a variable, thus satisfying the noneg property. The resulting
circuit requires 2 SYNC, 2 FF, 1 C, 4 OR and 12½ AND operators.
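
The disjointness of the conjuncts can be checked by enumeration. The Python sketch below encodes our reading of the transformed guard set, with each conjunct written as a mapping from a name to its required value; the encoding and the names holds and selected are assumptions of the sketch, and the probes are treated as values already assigned to variables.

```python
from itertools import product

# Each guard is a list of conjuncts; a conjunct maps a name to a required value.
guards = [
    [{"X": True, "s": True}],
    [{"X": False, "Y": True}, {"X": True, "s": False, "Y": True}],
    [{"X": False, "Y": False}, {"X": True, "s": False, "Y": False}],
]

def holds(conj, env):
    """True if every literal of the conjunct is satisfied in env."""
    return all(env[name] == want for name, want in conj.items())

states = [dict(zip(("X", "s", "Y"), v)) for v in product([False, True], repeat=3)]

# Number of conjuncts, over the whole guard set, that hold in each state.
fired_per_state = [sum(holds(c, e) for g in guards for c in g) for e in states]

def selected(env):
    """Index of the guard that owns the conjunct holding in env."""
    return next(i for i, g in enumerate(guards) if any(holds(c, env) for c in g))
```

Exactly one conjunct holds in each of the eight states, so no two of the per-conjunct processes can respond to the same request.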


Concurrent-one-wait  In order to use this scheme, the exclusive property must
hold on the original guard set, that is, ¬Q̄ ∨ ¬X̄ ∨ ¬s ∨ ¬Ȳ must be an invariant
of the original program. If this is the case, the guard set

X̄ ∧ s | Ȳ | false

can be implemented directly without any transformations, resulting in a simple
circuit requiring only 2½ AND operators.


4.8.4  Comparison  of  the  Guard  Evaluation  Schemes 

In the above example, the sequential and the concurrent-one-wait schemes produce
the most efficient circuits. This is not always the case: for other guard sets,
any of the schemes can produce the best implementation. We discuss the
advantages and disadvantages of each scheme.


Sequential  The  sequential  scheme  provides  a  straightforward  implementation  of 
an  arbitrary  guard  set  in  space  that  is  proportional  to  the  size  of  the  guard  set’s 
representation  in  the  source  language.  Unfortunately,  because  of  its  sequential  nature, 
the  time  needed  for  evaluation  is  also  linearly  related  to  its  size. 


Concurrent-all  This  evaluation  scheme  offers  several  potential  advantages  over  the 
previous  one.  For  guard  sets  with  many  guards,  the  time  needed  to  evaluate  the 
guard  set  is  proportional  to  the  logarithm  of  the  number  of  guards.  The  ability  to  do 
common  sub-expression  elimination  at  no  cost  is  an  added  benefit;  however,  the  basic 
processes  have  larger  implementations.  While  this  scheme  has  the  best  asymptotic 
area-time  performance,  we  have  yet  to  find  an  application  large  enough  to  reap  its 
benefits. 
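
The difference in growth rates can be made concrete with a small calculation. The Python sketch below is illustrative only: it assumes one step per guard for the sequential scheme and a balanced tree of two-input operators for combining the concurrent results, neither of which is a measurement of the compiled circuits.

```python
def sequential_steps(n):
    """Guards examined one after another: the worst case grows linearly."""
    return n

def tree_depth(n):
    """Levels of a balanced binary tree that combines n concurrent results."""
    return (n - 1).bit_length()   # equals ceil(log2(n)) for n >= 1

comparison = [(n, sequential_steps(n), tree_depth(n)) for n in (2, 8, 64)]
```

For 64 guards the sketch gives 64 sequential steps against a depth of 6, which suggests why the benefit appears only for large guard sets.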


Concurrent-one  While exponential blow-up may occur when transforming
pathological guard sets into disjoint disjunctive form, this scheme produces the
smallest and fastest implementations of most expressions that do not contain
probes.
Figure 7: The four circuits show four different guard evaluation schemes applied
to the guard set X̄ ∧ s | Ȳ. From top to bottom, the circuits were produced by
the sequential, concurrent-all, concurrent-one, and concurrent-one-wait schemes.
In the circuit produced by the concurrent-all scheme, the boxes enclosing the ∧
symbol represent the circuit implementing the and construct, as shown in
Figure 6.


Concurrent-one-wait  In the cases when the programmer can prove the exclusive
property without introducing negated probes, this scheme applies and provides a
non-polling implementation that does not dissipate any static power. Again,
exponential blow-up may occur, but typical implementations are much smaller and
much faster than those produced by the other schemes.


5  Automatic  Compiler 

We have constructed an automatic compiler that applies the translation rules
derived in this paper. The self-timed circuit description produced by the
compiler is then used as input by an automatic place-and-route tool, which
produces a standard-cell implementation of the circuit in VLSI. Using this
completely automatic design method, we have fabricated a functionally correct
chip implementing a worm-hole message routing system [2].

The  translation  method  produces  correct,  self-timed  implementations  of  arbitrarily 
large  concurrent  programs,  and  because  each  translation  rule  is  of  fixed  size,  the 
size  of  the  implementation  is  no  worse  than  linearly  related  to  the  size  of  the  source 
program.  The  translation  method  and  the  compiler  provide  a  constructive  proof  that 
this  design  methodology,  based  on  programs,  represents  a  practical  approach  to  the 
design  of  VLSI  systems. 